Over the past few years the computer has changed
from a word/graphics processor to an all-round unit
that, among other things, is suitable for recording
and reproducing audio and video signals. Particularly
as regards audio reproduction, a number of different
formats have come into being. This article describes
the more important of these briefly and the currently
popular MP3 in some detail.

zation. The rate at which the conversion
takes place in the case of a CD is
44,100 times per second for the left-hand
channel and the same number
of times for the right-hand channel.
Since the resolution of a CD is
16 bits, the quantization of an audio
signal results in a bit stream of about
1.4 Mbit/s. This bit stream, together with
additional data for error correction
and other information, is recorded on
the disc. During playback, the
process is simply reversed: the
data on the CD are read by a
laser and translated by a
digital-to-analogue converter
(DAC) into analogue signals.
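
The figure of about 1.4 Mbit/s follows directly from the sampling rate, the resolution, and the number of channels quoted above. A minimal Python sketch of that arithmetic is given below; the function name pcm_bit_rate is chosen for illustration only, and the error-correction overhead mentioned in the text is deliberately left out.

```python
# Raw (uncompressed) PCM bit rate of CD audio, using the figures in the text.
SAMPLE_RATE_HZ = 44_100   # conversions per second, per channel
BITS_PER_SAMPLE = 16      # resolution of a CD
CHANNELS = 2              # left-hand and right-hand channel


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the raw bit rate in bit/s (illustrative helper, not from the article)."""
    return sample_rate_hz * bits_per_sample * channels


cd_bit_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"{cd_bit_rate} bit/s = {cd_bit_rate / 1e6:.2f} Mbit/s")
```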

Audio signals in a computer are processed in a
similar manner. The requisite electronics
for the ADC and DAC are contained
on the sound card, while the hard disc
normally functions as the storage
medium. In a computer, the bit stream
is written into a data file.

The storing of the binary words 
requires a large amount of space: 
about 1 Mbyte for every six seconds of 


possible applications, and the diversity of
types of computer that have come
about in the past ten years or so, a
multitude of file formats for storing
audio signals has ensued.


introduction 

Today, a computer without sound is a
rarity. Whether it concerns a sophisticated
computer game or the simple
warning bleep accompanying various
commands, sound is a must. Sound
has become important even for
Internet users. Nowadays, you can listen
to radio broadcasts or download
music via the net. Owing to all these

quantization


Thanks to the introduction of the com- 
pact disc (CD) almost 20 years ago, 
many people know what is understood 
by digital sound. Briefly, for those who 
do not, the analogue (proportional) 
audio signal is translated by an
analogue-to-digital converter (ADC) to a
large number of binary words. This 
process is called quantization or digiti- 


sound of CD quality. There are several
means of limiting this space requirement.
One is lowering the sampling
frequency from 44.1 kHz to, say, 10 kHz,
which reduces the bandwidth.
Another is recording the signal in
mono(phonic) instead of stereo(phonic)
format, which degrades the
quality. A third is lowering the resolution
from 16 bits to, say, 8 bits, which tends
to degrade the signal-to-noise ratio
and increase the distortion. A fourth,
widely used, method is compressing
the signal during recording and
expanding it during playback.
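
To put numbers on the first three options, the Python sketch below compares the storage needed per second for each of them against full CD quality. It takes 1 Mbyte as 1,048,576 bytes, which is an assumption; the helper names are illustrative only. For CD quality it reproduces the figure of about six seconds per Mbyte.

```python
# Storage needed for uncompressed audio under the space-saving options in the text.
MBYTE = 1_048_576  # assumption: 1 Mbyte taken as 2**20 bytes


def bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw storage rate in bytes per second (illustrative helper)."""
    return sample_rate_hz * bits_per_sample * channels // 8


def seconds_per_mbyte(sample_rate_hz, bits_per_sample, channels):
    """Playing time that fits into one Mbyte."""
    return MBYTE / bytes_per_second(sample_rate_hz, bits_per_sample, channels)


options = {
    "CD quality (44.1 kHz, 16 bit, stereo)": (44_100, 16, 2),
    "lower sampling rate (10 kHz)": (10_000, 16, 2),
    "mono instead of stereo": (44_100, 16, 1),
    "8-bit resolution": (44_100, 8, 2),
}

for name, params in options.items():
    print(f"{name}: {bytes_per_second(*params)} bytes/s, "
          f"{seconds_per_mbyte(*params):.1f} s per Mbyte")
```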

Almost all computer systems (Atari,
Unix, Intel, and so on) have their own
specific file format, which has resulted
in the same confused situation as
exists in the case of pixel formats. That
is to say, there is a bewildering multiplicity
of formats, which does not help
the consumer. This situation is not
improved by the addition of yet more
formats via the Internet.

Table 1 gives an overview of the
most frequently encountered formats
together with a brief description.


to compress 
or not to compress 


Compressing an audio file is fairly
complex because similarities
between the large number of samples
are not easily determined.
Consequently, there are only a few
lossless compression methods for
audio signals. A simple and widely used
scheme is the ADPCM standard (in
Windows) for the popular WAV format,
although, strictly speaking, ADPCM
discards some detail and is therefore
not truly lossless.

It is, however, possible to compress
audio signals if some loss of detail is
acceptable, as in the processing of
digital images. In the case of digital
images, the compression method is
laid down by the Joint Photographic
Experts Group (JPEG), a subgroup of the
Joint Technical Committee 1 (JTC1) of
the ISO (International Organization for
Standardization, consultant to the United
Nations) and the IEC (International
Electrotechnical Commission). Another
subgroup of JTC1, the Moving
Picture Experts Group or MPEG, has
laid down a standard for audio
compression: the MPEG format. Currently,
MPEG-1 Layer 3 (MP3) is the popular
format for Internet users.


real time audio 


In the early days of audio in computers,
the entire audio file had to be
loaded into the computer memory
before it could be played back. This is
particularly irksome when files are
downloaded via the Internet.

With the advent of fast modems it
has become possible for techniques to
be developed that enable sound files
to be played back during
downloading. A particularly worthwhile
contribution to this has been made by a
process called ‘AOD’ (Audio On
Demand), ‘real time audio’ or ‘streaming
audio’, a product of the company
RealAudio.

Table 1.
Overview of the most frequently encountered audio formats


AIF           A format originally developed for the Macintosh. It is quite popular on the
              Internet and offers many different sampling rates and resolutions. Netscape
              and Microsoft browsers have no difficulty working with it.

AU            Originates from NeXT and Sun; it was popular on the web and is still in use
              today. There are a number of variants, but normally the data are compressed
              to an 8-bit format with the µ-law standard. Most browsers can work with it.

ES            A streaming audio format from EchoCast. Players are available for Windows
              and Macintosh.

IFF           An Amiga sound format that handles only 8-bit mono sound; it allows a free
              choice of the sampling rate.

LCC           A high-compression format (ratios up to 1:50 are possible) currently available
              for Windows only, but other versions are under development.

MID           The MIDI format is not a true audio format, but a standard for exchanging
              control data between electronic musical instruments. Browsers can handle
              MIDI via a plug-in.

MOD           An original Amiga format that is reminiscent of MIDI. A MOD file contains a
              bank with samples and instructions how these samples are to be played.
              Requires a MOD plug-in.

MP3           The most popular current audio format. Various plug-ins are available.

RA, RAM, RPM  A popular streaming audio format on the Internet from RealAudio.
              Plug-ins are available for virtually all platforms.

SND           Apple, Amiga and Tandy have used this suffix for sound files. Some variants
              are compatible with the AU format.

STR           A format for professional sound processing on the Macintosh.

VDO           Another streaming audio format. It requires a plug-in such as a VDO live
              player.

VMD           A streaming audio format named Internet Wave. It comes with a free
              encoder/decoder for Windows.

VOC           The Voice format, which is a development of Creative Labs, maker of the
              well-known Soundblaster cards.

WAV           The much used wave format has become well known since the introduction
              of Windows. It offers many different sampling rates, resolutions and
              compression factors.

XDM           An MPEG format from StreamWorks for streaming audio. There is a special
              player for Windows.


MP3: good quality
with high compression

Currently, MP3 is the most popular of
all audio formats. In a very short time,
this audio compression protocol has
gained a strong position as a music
compressor. Many modern computers
come with MP3 encoding and decoding
software installed. Also, large numbers
of MP3 files are being disseminated
via the Internet and there are
already compilation CDs that contain
MP3 files. Some manufacturers have
started to make available an MP3
Walkman™: a solid-state portable
player which uses flash memory as the
carrier.


developed for DAB 


As a contributor to the pan-European
Eureka 147 project (to develop a terrestrial
digital audio broadcasting system, DAB),
the German Fraunhofer
Institut für Integrierte Schaltungen
(Fraunhofer Institute for Integrated
Circuits, IIS) has developed a codec
(coder/decoder) for DAB in which the
perception of the listener plays an
important role.

The relevant algorithm takes into
account certain properties of human
hearing and on this basis determines
whether a certain aspect of the sound
in a piece of music is likely to be
perceived by the listener or not.
Depending on this likelihood, it further
determines whether the relevant data
should be included in the bit stream or
not. This results in a redundancy
compression system that enables a
substantial data reduction without
degrading the sound.

The algorithm has been further
enhanced by IIS in cooperation with
the University of Erlangen and has
been accepted as ISO MPEG-1
Layer-3 (IS 11172-3 and IS 13818-3).

Figure 1. Each MPEG-1 layer makes use of a 32-band filter bank. Whether the signal
becomes masked or not is determined after it has been quantized.


layer 1, 2 or 3


Without data reduction, audio signals
consist of 8-bit or 16-bit wide samples,
which are taken at a rate that is at least
twice as high as the highest frequency in
these signals.

It has already been stated that the
digitization of an audio signal results in
a bit stream of about 1.4 Mbit/s. A
state-of-the-art compression system
can compress this by a ratio of 1:12
without an audible degradation of the
sound. Reduction ratios of up to 1:24
are possible and still result in a sound
quality that is superior to that obtained
when the sampling rate or the resolution
is reduced to achieve a comparable
compression.
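
As a quick check of what these ratios mean in practice, the following Python sketch divides the raw CD bit stream by the two compression ratios mentioned above. The function name compressed_kbit_s is illustrative, and the raw rate is taken as 44.1 kHz, 16 bits, stereo, without any error-correction data.

```python
# Bit rate left over after compressing the raw CD bit stream by a given ratio.
CD_BIT_RATE = 44_100 * 16 * 2  # bit/s, about 1.4 Mbit/s


def compressed_kbit_s(ratio):
    """Bit rate in kbit/s after compression by 1:ratio (illustrative helper)."""
    return CD_BIT_RATE / ratio / 1000


for ratio in (12, 24):
    print(f"1:{ratio} -> {compressed_kbit_s(ratio):.1f} kbit/s")
```

A ratio of 1:12 thus leaves a stream of a little under 120 kbit/s for both channels together.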

The MPEG-1 standard describes
three layers of compression: Layer 1,
Layer 2, and Layer 3. All three are
capable of producing sound of
near-CD quality. Table 2 gives an overview
of the characteristics of these layers.
The definitions within the standard refer
only to the encoder and the data
format used. This information enables
manufacturers to design a decoder to
their own requirements.

If stereophonic reproduction is not
used, and if furthermore a restricted
bandwidth is acceptable, even higher
compression ratios can be used. The
highest layer, Layer 3, uses the lowest
bit rate and produces the best
sound quality.

The three codecs are hierarchically 
compatible, which means that a 
decoder for Layer 3 can also be used 
with Layers 1 and 2. This is, however,
not possible the other way around. 

The higher the number of the layer,
the more complex the encoder
becomes and the higher the compression
ratio that can be used.

Table 3 shows the possible
compression ratios attainable with
Layer 3 and what these can be used
for.

Listening tests with mixed audiences
show that the performance of Layer 3
remains excellent with a compression
ratio of 1:12, which is associated with
a bit stream of 64 kbit/s per audio
channel. If for certain applications the
bandwidth can be limited to 10 kHz,
good stereo reproduction is possible
with a compression ratio of 1:24.


uniform structure 


All three layers have the same structure.
Their encoding technique is
known as perceptual noise shaping or
perceptual subband transform coding.
The encoder analyses the spectral
components of an audio signal with
the aid of a filter bank (see Figure 1)
and uses a psychoacoustic model to
determine the discernible noise levels.
Subsequently, the information is quantized
and encoded in a manner which
ensures that two important conditions
are taken into account: the maximum
bit stream and the masking effect.

All three layers use the same filter
bank with 32 subbands. They all permit
sampling rates of 32 kHz, 44.1 kHz,
and 48 kHz, and are capable of working
with bit streams of 32 kbit/s or higher.


background 


To achieve a substantial reduction of
the requisite digital bandwidth,
MPEG-1 Layer 3 uses several techniques
and short-cuts. The most important
of these are:


• lower threshold of hearing
• masking effect
• a store of bytes
• joint stereo
• Huffman coding


Lower threshold of hearing 

Research has shown that the lower
threshold of human hearing is not the
same at all frequencies: the ear is most
sensitive between 2 kHz and 5 kHz.
Its properties are described by
Fletcher and Munson. It is not necessary
to encode sound that lies below
the threshold since the listener cannot
hear it.
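
The shape of the threshold can be illustrated with a short Python sketch. Note the assumptions: the formula below is a commonly quoted approximation of the threshold in quiet (attributed to Terhardt) and is not taken from this article or from the MPEG standard; the function names and the test levels are illustrative only.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL at a given frequency.

    Assumption: Terhardt's approximation, used here purely for illustration.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db_spl):
    """A tone below the threshold need not be encoded."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


# Frequency (in 100 Hz steps) at which the ear is most sensitive.
most_sensitive_hz = min(range(100, 16_000, 100), key=threshold_in_quiet_db)
print("most sensitive near", most_sensitive_hz, "Hz")

# The same 10 dB tone is inaudible at 100 Hz but audible at 3 kHz.
print(is_audible(100, 10.0), is_audible(3_000, 10.0))
```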


Masking effect 
Use is made of the fact that human
hearing does not perceive weak
sounds that are totally or partially
masked by (much) stronger ones.
Research shows that, owing to the
masking, certain sounds need not be
encoded, which saves quite a lot of
space. All MPEG-1 Layer 3 encoders
therefore contain a psychoacoustic
model in which the properties of
human hearing are incorporated.


Store of bytes 

It often happens that a passage of
music cannot be encoded with the
available bit rate. The quality of the
sound must then be adapted temporarily
to enable the bit stream to
match the capacity of the digital
channel. MPEG-1 Layer 3 uses a buffer
that provides some additional capacity
in such circumstances. The buffer is
emptied when sound is encoded at a
bit rate lower than that available in the
channel.
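
The behaviour of such a buffer can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual MPEG-1 Layer 3 mechanism: the frame sizes, the channel capacity, and the buffer size are invented numbers, and the function name simulate_buffer is hypothetical.

```python
def simulate_buffer(frame_demands, channel_bits, buffer_size):
    """Toy model of a buffer between encoder and channel.

    frame_demands: bits each frame would ideally need (invented values)
    channel_bits:  bits the channel can carry per frame
    buffer_size:   extra bits the buffer can hold
    Returns a list of (bits_used, buffer_fill) per frame.
    """
    fill = 0
    result = []
    for demand in frame_demands:
        # A frame may use the channel share plus whatever room is left in the buffer.
        allowed = channel_bits + (buffer_size - fill)
        used = min(demand, allowed)  # quality is reduced if the demand exceeds this
        # The buffer fills when a frame exceeds the channel share and empties otherwise.
        fill = max(0, fill + used - channel_bits)
        result.append((used, fill))
    return result


frames = [300, 500, 700, 200, 100]
for used, fill in simulate_buffer(frames, channel_bits=400, buffer_size=200):
    print(f"bits used: {used:3d}  buffer fill: {fill:3d}")
```

In this run the second frame borrows capacity from the buffer, the third has to be cut back because the buffer is full, and the two easy frames that follow let the buffer empty again.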


Joint stereo

Many small stereo hi-fi systems use a 
common woofer. In spite of this, the lis- 
tener gets the impression that the 
sound does not emanate from this 
loudspeaker, but rather from the satel- 
lites. Research shows that below a cer- 
tain frequency the human ear is not 
able to judge from which direction the 
sound comes. Compression tech- 
niques can make use of this property 
by not including stereo information 
below a certain frequency. This means 
that below that frequency the signal is 
encoded in monophonic form only. 


Huffman code 

The encoding of MPEG-1 Layer 3 uses
a classical technique: the Huffman
code. This is used after the actual data
compression has taken place to
encode the digital information. It is,
therefore, not a compression system
but a very efficient encoding technique.
The Huffman algorithm generates
a code of variable length and a
whole number of bits. Frequently
occurring values are allocated a short
code, rarer ones a longer code.

Since no Huffman code word forms the
beginning of another, the codes can be
decoded perfectly in spite of the variable
length. Decoding is very fast since use can
be made of a table. The technique gives
a space saving of some 20 percent.
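
The principle can be demonstrated with a small, generic Huffman coder in Python. This is a textbook sketch, not the fixed code tables that MPEG-1 Layer 3 actually uses; the sample values and all function names are invented for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Build a Huffman code table {symbol: bit string} for a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)   # two least frequent groups
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Table-driven decoding; works because no code is the start of another."""
    table = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in table:
            out.append(table[current])
            current = ""
    return out


# Invented quantized values: small values are frequent, large ones rare.
data = [0] * 12 + [1] * 5 + [-1] * 4 + [2] * 2 + [7]
codes = huffman_codes(data)
bits = encode(data, codes)
print(codes)
print(len(bits), "bits instead of", 3 * len(data), "with a fixed 3-bit code")
```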

The Huffman technique is an ideal
complement to the perception-dependent
compression. In passages
containing many frequencies
simultaneously, the perception-dependent
encoding provides an appreciable
reduction by eliminating masked signals.
Since few identical signals occur,
the Huffman code has little effect.

During passages with few different
sounds, not many masking effects
occur. This is when the Huffman code
saves considerable space since there
is much redundant information. Such
passages can, therefore, be represented
by short codes. [992001]

Table 2. The modes of MPEG-1 audio

Compression factor   Method        Comments

1:4                  via Layer 1   stereo signal results in a bit stream of 384 kbit/s
1:6 - 1:8            via Layer 2   stereo signal results in a bit stream of 256-192 kbit/s
1:10 - 1:12          via Layer 3   stereo signal results in a bit stream of 128-112 kbit/s

There is much information and many links at www.mpeg.org.


Table 3. Facilities available with MPEG-1 Layer 3

Sound quality             Bandwidth (kHz)   Mode     Bit rate (kbit/s)   Compression factor

Telephone                 2.5               mono     8                   1:96
Better than shortwave     4.5               mono     16                  1:48
Better than medium wave   7.5               mono     32                  1:24
FM radio                  11                stereo   56-64               1:26 - 1:24
Near-CD                   15                stereo   96                  1:16
CD                        >15               stereo   112-128             1:14 - 1:12